Discarding Noise in an Automatically Acquired Lexicon of Support Verb Constructions
Author
Abstract
We applied data-driven methods to automatically acquire Dutch prepositional support verb constructions (SVCs), e.g., iets in de gaten houden (“keep an eye on something”), from corpora. This paper addresses the question of whether linguistic diagnostics help to discard noise from the n-best lists and how to apply such diagnostics (semi-)automatically to parsed corpora. We show that some of the linguistic diagnostics proposed in Hollebrandse (1993) effectively identify SVCs and contribute a modest decrease in error rate.
Similar resources
The Lexicon-Grammar of Italian Idioms
This paper presents the Lexicon-Grammar classification of Italian idioms that has been constructed on formal principles and, as such, can be exploited in information extraction. Among MWEs, idioms are those fixed constructions which are hard to automatically detect, given their syntactic flexibility and lexical variation. The syntactic properties of idioms have been formally represented and cod...
Deverbal Nouns in Czech Light Verb Constructions
In this paper, we provide a well-founded description of Czech deverbal nouns in both nominal and verbal structures (light verb constructions), based on a complex interaction between the lexicon and the grammar. We show that light verb constructions result from a regular syntactic operation. We introduce two interlinked valency lexicons, NomVallex and VALLEX, demonstrating how to minimize the s...
Automatic translation of support verb constructions
M. Gross (1981) calls such verbs 'support verbs', and I shall adopt his terminology. These verbs exhibit many interesting properties which have been studied systematically for several French support verbs: faire (make), avoir (have), prendre (take), être (be), etc. An examination of the results indicates that support verbs must be taken into account in the parser and in the lexicon of a progr...
Hindi CCGbank: CCG Treebank from the Hindi Dependency Treebank
In this paper, we present an approach for automatically creating a Combinatory Categorial Grammar (CCG) treebank from a dependency treebank for the Subject-Object-Verb language Hindi. Rather than a direct conversion from dependency trees to CCG trees, we propose a two-stage approach: a language-independent generic algorithm first extracts a CCG lexicon from the dependency treebank. A determinis...
Semi-automatic Building of Swedish Collocation Lexicon
This work focuses on semi-automatic extraction of verb-noun collocations from a corpus, performed to provide lexical evidence for the manual lexicographical processing of Support Verb Constructions (SVCs) in the Swedish-Czech Combinatorial Valency Lexicon of Predicate Nouns. The efficiency of a purely manual extraction procedure is significantly improved by the use of automatic statistical methods ...
Publication date: 2004